Abstract. This paper presents a new hybrid learning algorithm for unsupervised
classification tasks. We combined the Fuzzy c-means learning algorithm and a
supervised version of Minimerror to develop a hybrid incremental strategy
allowing unsupervised classification. We applied this new approach to a
real-world database in order to determine whether the information contained in
the unlabeled features of a Geographic Information System (GIS) allows it to be
classified well. Finally, we compared our results to a classical supervised
classification obtained with a multilayer perceptron.

1 Supervised and Unsupervised Learning

For a classification task, learning is supervised if the class labels of the
input patterns are given a priori by a teacher. A cost function measures the
difference between the desired outputs and the actual outputs produced by a
network; this difference is then minimized by modifying the network's weights
with a learning rule. A supervised learning set $\mathcal{L}$ consists of $P$
pairs $(\xi^\mu, \tau^\mu)$, $\mu = 1, \dots, P$, where $\xi^\mu$ is input
pattern $\mu$ and $\tau^\mu = \pm 1$ its class. $\xi^\mu$ is an $N$-dimensional
vector with numeric or categorical values. If labels are not present in
$\mathcal{L}$, it may be used for unsupervised learning. Learning is
unsupervised when the class of each object is not known in advance. This
learning is performed by extracting the intrinsic regularities of the patterns
presented to the network. The number of neurons in the output layer corresponds
to the desired number of categories. The network therefore develops its own
representation of the input patterns, retaining the statistically redundant
traits.



2 Supervised Minimerror 



The Minimerror algorithm [1] performs well on high-dimensional binary problems
[3, 4, 10]. The supervised version of Minimerror performs binary classification
by minimizing the cost function:

with

$$V(x) = 1 - \tanh(x) \qquad (2)$$

The temperature $T$ defines an effective window width on both sides of the
separating hyperplane defined by $\mathbf{w}$. The derivative $dV(x)/dx$ is
vanishingly small outside this window. Therefore, if the minimum of the cost (1)
is searched through gradient descent, only the patterns at a distance

$$|\gamma^\mu| = \left| \frac{\mathbf{w} \cdot \xi^\mu}{\sqrt{N}} \right| < 2T \qquad (3)$$

will contribute significantly to learning [1,2]. The Minimerror algorithm
implements this minimization starting at high temperature. The weights are
initialized with Hebb's rule, which is the minimum of (1) in the
high-temperature limit. Then $T$ is slowly decreased over the successive
iterations of the gradient descent by deterministic annealing, so that only the
patterns within the narrowing window of width $2T$ are effectively taken into
account when calculating the correction

$$\delta\mathbf{w} = -\epsilon \frac{\partial E}{\partial \mathbf{w}} \qquad (4)$$

at each time step, where $E$ denotes the cost (1) and $\epsilon$ is the learning
rate. Thus, the search for the hyperplane becomes more and more local as the
number of iterations increases. In practical implementations, it was found that
convergence is considerably sped up if patterns already learned are considered
at a lower temperature $T_L$ than those not yet learned, $T_L < T$. The
Minimerror algorithm has three free parameters: the learning rate $\epsilon$ of
the gradient descent, the temperature ratio $T_L/T$, and the annealing rate
$\delta T$ at which the temperature is decreased. At convergence, a last
minimization with $T_L = T$ is performed. This algorithm has been coupled with
an incremental heuristic, NetLS [2,5], which adds neurons to a single hidden
layer as learning proceeds. Several results [2-4] show that NetLS is very
powerful and gives small generalization errors, comparable to those of other
methods.
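
As an illustration of the annealed gradient descent described in this section,
here is a minimal Python sketch. It rests on several assumptions that are not
taken from the text: the cost is assumed to have the form
E = (1/2) * sum_mu V(tau_mu * w.xi_mu / (2 T sqrt(N))), chosen to be consistent
with Eqs. (2)-(3); the weights are renormalized to norm sqrt(N) after each step;
a single temperature is used (the lower temperature T_L for already learned
patterns is omitted); and the function name and default parameter values are
hypothetical.

```python
import numpy as np

def minimerror_sketch(X, tau, T0=10.0, T_min=0.1, dT=0.05, eps=0.01):
    """Annealed gradient descent on an ASSUMED Minimerror-like cost.

    X   : (P, N) array of input patterns
    tau : (P,) array of classes, +1 or -1
    """
    P, N = X.shape
    sqrt_n = np.sqrt(N)
    # Hebb's rule as the starting point (high-temperature limit).
    w = (tau[:, None] * X).sum(axis=0)
    w *= sqrt_n / np.linalg.norm(w)          # assumption: keep ||w|| = sqrt(N)
    T = T0
    while T > T_min:
        # Argument of V for every pattern; clipped to avoid overflow in cosh.
        x = np.clip(tau * (X @ w) / (2.0 * T * sqrt_n), -30.0, 30.0)
        # dV/dx = -1/cosh^2(x): negligible for patterns far from the hyperplane.
        window = 1.0 / np.cosh(x) ** 2
        grad = -(window * tau) @ X / (4.0 * T * sqrt_n)   # dE/dw
        w = w - eps * grad                   # correction of Eq. (4)
        w *= sqrt_n / np.linalg.norm(w)
        T -= dT                              # deterministic annealing
    return w
```

The factor `window` makes the locality of the search explicit: as `T` decreases,
only patterns close to the hyperplane keep a non-negligible weight in the
gradient.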

3 Unsupervised Minimerror 

A variation of Minimerror, Minimerror-S [2,3], makes it possible to obtain
spherical separations in input space. The spherical separation uses the same
cost function (1), but a spherical stability $\gamma_s$ is defined by:

$$\gamma_s = \|\mathbf{w} - \xi^\mu\|^2 - \rho^2 \qquad (5)$$

where $\rho$ is the radius of a hypersphere centered on $\mathbf{w}$. The class
of a pattern is $\tau = -1$ inside the sphere and $\tau = 1$ elsewhere.
Spherical separations make it possible to consider unsupervised learning using
Minimerror's separating qualities.
Thus, an unsupervised growing strategy was developed at Loria. The algorithm
starts by computing the distances between the patterns; the Euclidean distance
can be used for this. Once the distances are established, we find the pair of
patterns $\mu$ and $\nu$ with the smallest distance $\rho$. This creates the
first incremental kernel. We place the center $\mathbf{w}_0$ of the hypersphere
midway between patterns $\mu$ and $\nu$:

$$\mathbf{w}_0 = \frac{\xi^\mu + \xi^\nu}{2} \qquad (6)$$

The initial radius is fixed so that a certain number of patterns fall inside the
growing kernel. Then, patterns are labeled $\tau = -1$ if they are inside or on
the border of the initial sphere, and $\tau = 1$ otherwise. Minimerror-S finds
the hypersphere $\{\rho^*, \mathbf{w}^*\}$ that best separates the patterns. The
internal representations are $\sigma = -1$ if

$$\|\mathbf{w}^* - \xi^\mu\| < \rho^*$$

else $\sigma = 1$. This makes it possible to check whether there are patterns
with $\tau = 1$ outside but sufficiently close to the sphere. In this case, the
algorithm sets $\tau = -1$ for these patterns and learns them again, repeating
the procedure for all patterns of $\mathcal{L}$. It then moves on to another
growing kernel, which will form a second class $\mathbf{w}_2$, computing
$\{\rho_2, \mathbf{w}_2\}$ with Minimerror-S, and repeats the procedure until
there are no more patterns to classify. Finally, it obtains $K$ classes. A
pruning procedure can avoid having too many classes by eliminating those with
few elements (fewer than a number fixed in advance). It is possible to introduce
conditions at the border, which are restrictions that prevent the hypersphere
center from being located outside the input space. For certain problems this
strategy can be useful. These restrictions are, however, optional: if too many
learning errors are made, the algorithm neglects them, and the center and radius
of the separating spheres can diverge.
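
The seeding of a growing kernel described in this section can be sketched as
follows. This is only an illustration under stated assumptions: the spherical
stability is read as the squared distance to the center minus the squared
radius (one reading of Eq. (5)), the Euclidean distance is used, and the
function names and the parameter `n_inside` (the number of patterns the initial
sphere should enclose) are hypothetical. The Minimerror-S optimization itself is
not reproduced.

```python
import numpy as np

def spherical_stability(X, w, rho):
    """Squared distance to the center minus squared radius (negative inside)."""
    return np.sum((X - w) ** 2, axis=1) - rho ** 2

def initial_kernel(X, n_inside=5):
    """Seed a growing kernel from the closest pair of patterns.

    Returns the initial center w0, the initial radius rho0, and labels
    tau = -1 (inside or on the border of the sphere) / +1 (elsewhere).
    """
    # Pairwise Euclidean distances between all patterns.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)           # ignore zero self-distances
    mu, nu = np.unravel_index(np.argmin(dist), dist.shape)
    w0 = 0.5 * (X[mu] + X[nu])               # midpoint of the closest pair
    # Radius chosen so that n_inside patterns fall within the sphere.
    d_center = np.linalg.norm(X - w0, axis=1)
    rho0 = np.sort(d_center)[n_inside - 1]
    tau = np.where(d_center <= rho0, -1, 1)
    return w0, rho0, tau
```

The labels returned by `initial_kernel` would then serve as the targets handed
to the spherical separator, which refines the center and radius.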



4 The Unsupervised Algorithm Fuzzy c-means 

This algorithm [6, 7] yields a clustering of the patterns with a fuzzy approach.
Fuzzy c-means minimizes the sum of the squared errors under the following
conditions:

$$\sum_{k=1}^{c} m_{ik} = 1; \quad \sum_{i=1}^{n} m_{ik} > 0; \quad m_{ik} \in [0, 1] \qquad (8)$$

$$i = 1, 2, \dots, n; \quad k = 1, 2, \dots, c \qquad (9)$$

The objective function is defined by

$$J = \sum_{i=1}^{n} \sum_{k=1}^{c} m_{ik}^{\phi} \, d^2(x_i, c_k) \qquad (10)$$

where $n$ is the number of patterns, $c$ is the desired number of classes,
$m_{ik}$ is the degree of membership of pattern $i$ in class $k$, $c_k$ is the
centroid vector of class $k$, $x_i$ is pattern $i$, and $d^2(x_i, c_k)$ is the
square of the distance between $x_i$ and $c_k$, according to a distance
definition left unspecified, which for simplicity we will denote by $d_{ik}^2$.
$\phi$ is a fuzzy parameter, a value in $[2, \infty)$, which determines the
fuzzification of the final solution, i.e., it controls the overlap between the
classes. If $\phi = 1$, the solution is a hard partition. If $\phi \to \infty$,
the solution approaches maximum fuzzification and all the classes are likely to
merge into a single one. The minimization of the objective function $J$ provides
the solution for the membership function [6]:
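$$m_{ik} = \frac{d_{ik}^{-2/(\phi-1)}}{\sum_{j=1}^{c} d_{ij}^{-2/(\phi-1)}}; \quad i = 1, \dots, n; \quad k = 1, \dots, c \qquad (11)$$

where:

$$c_k = \frac{\sum_{i=1}^{n} m_{ik}^{\phi} \, x_i}{\sum_{i=1}^{n} m_{ik}^{\phi}}; \quad k = 1, \dots, c \qquad (12)$$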

The fuzzy c-means algorithm is:

1. Choose the number of classes $c$, with $1 < c < n$.

2. Choose a value for the fuzzy parameter $\phi \geq 2$.

3. Choose a suitable distance definition in input space. This may be the
Euclidean distance, in which case $d^2(x_i, c_k) = \|x_i - c_k\|^2$.

4. Choose a value for the stopping criterion $\epsilon$ ($\epsilon = 0.001$
gives suitable convergence).

5. Initialize the membership matrix $M = M_0$ with random values or with values
from a hard k-means partition.

6. At iteration $t = 1, 2, 3, \dots$, (re)calculate the centroids $C = C_t$
using equation (12) and $M_{t-1}$.

7. Recalculate $M = M_t$ using equation (11) and $C_t$.

8. Compare $M_t$ and $M_{t-1}$ with a suitable matrix norm. If
$\|M_t - M_{t-1}\| < \epsilon$ then stop, else go to 6.
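
The eight steps above can be sketched in Python with NumPy. This is a minimal
illustration, not the implementation used in the experiments: it assumes the
Euclidean distance, a random initial membership matrix, the Frobenius norm as
the matrix norm of step 8, and the standard fuzzy c-means update formulas for
memberships and centroids. The function name, the `max_iter` safeguard and the
`seed` parameter are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, phi=2.0, eps=1e-3, max_iter=300, seed=0):
    """Fuzzy c-means with Euclidean distance.

    X : (n, N) array of patterns; c : number of classes; phi > 1.
    Returns the membership matrix M (n, c) and the centroids C (c, N).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 5: random memberships, each row normalized to sum to 1.
    M = rng.random((n, c))
    M /= M.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 6: centroids as membership-weighted means of the patterns.
        Mphi = M ** phi
        C = (Mphi.T @ X) / Mphi.sum(axis=0)[:, None]
        # Step 7: memberships from squared distances to the centroids.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)           # guard against division by zero
        inv = d2 ** (-1.0 / (phi - 1.0))     # equals d ** (-2 / (phi - 1))
        M_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 8: stop when the membership matrix no longer changes.
        converged = np.linalg.norm(M_new - M) < eps
        M = M_new
        if converged:
            break
    return M, C
```

A hard partition can be read off the result with `M.argmax(axis=1)`, which
assigns each pattern to the class in which its membership is largest.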



5 A Hybrid Strategy 

In spite of the simplicity of supervised Minimerror, the number of classes
obtained is sometimes too high. Thus, we chose a combined strategy: a first
unsupervised hidden layer calculates the centroids with the Fuzzy c-means
algorithm. As input we have the $P$ unlabeled patterns of the learning set
$\mathcal{L}$. Then supervised Minimerror finds spherical separations well
adapted to maximizing the stability of the patterns. Its input is the same set
$\mathcal{L}$, but labeled by Fuzzy c-means. In this way, the number of classes
can be selected in advance.
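
The hand-over between the two stages can be illustrated with a short sketch. The
text does not say how the fuzzy memberships are turned into class labels; the
code below assumes, for the two-class case, that each pattern takes the class of
its largest membership, coded as -1 or +1. The function name is hypothetical.

```python
import numpy as np

def labels_from_memberships(M):
    """Turn a two-column fuzzy membership matrix into +/-1 labels.

    Assumption: a pattern belongs to the cluster with the largest membership;
    cluster 0 is coded -1 and cluster 1 is coded +1 (ties go to cluster 0).
    """
    hard = M.argmax(axis=1)
    return np.where(hard == 0, -1, 1)
```

These labels would then play the role of the classes in the supervised
Minimerror stage.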



6 Deposit Prospection Experiment 

The mineral resources division of the French geological survey (BRGM [8])
develops continent-scale Geographic Information Systems (GIS), which support
metallogenic research. Such a GIS constitutes a decision-making tool for this
difficult real-world problem. The formation of metals such as gold, copper or
silver is not understood well enough, and many patterns describing a site are
available, including the size of the deposit for various metals. In this study,
we focus on a GIS which covers the whole of the Andes and on two classes:
deposit and barren. A deposit is an economically exploitable mineral
concentration [9]. The concentration factor corresponds to the rate of
enrichment in a chemical element, i.e. to the ratio between its average
exploitation grade and its abundance in the earth's crust. Geologists contrast
the concept of deposit with that of barren. For the interpretation of the
generalization results, it is necessary to report the number of sites correctly
classified in each category in order to answer the question: is this a deposit
or a barren? In our study, a deposit is defined as a site (represented by a
pattern) that contains at least one metal, and a barren as a site without any
metal. The classes deposit and barren will be used from now on. The database we
used contains 641 patterns, 398 examples of deposits and 343 examples of
barrens.

6.1 Study of the Attributes 

The original database has 25 attributes, 8 qualitative and 17 quantitative, such
as the position of a deposit, the type and age of the country rock hosting the
deposit, the proximity of the deposit to a fault zone distinguished by its
orientation in map view, the density and focal depth of earthquakes immediately
below the deposit, the proximity of active volcanoes, the geometry of the
subduction zone, etc. We carried out a statistical study to determine the
importance of each variable. For each attribute, we calculated the average over
the deposit patterns and over the barren patterns, in order to determine which
attributes were relevant for discriminating the patterns (figure 1). Some
attributes (15, 16, 17 or 22, among others) are not relevant. On the other hand,
attributes 3, 5, 6 and 25 are rather discriminating. It is interesting to know
how the choice of attributes influences learning and especially generalization.
Therefore, we created 11 databases with different combinations of attributes.
Table 1 shows the number of qualitative and quantitative attributes, and the
dimension of each database used.
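
The per-attribute comparison of class averages can be sketched as follows. This
is an illustration only: it assumes the patterns are stored in a NumPy array
with one column per attribute and that the classes are coded +1 (deposit) and
-1 (barren); the function name and this coding are hypothetical.

```python
import numpy as np

def attribute_relevance(X, y):
    """Squared difference between the average deposit and barren patterns.

    X : (P, N) array of patterns; y : (P,) array, +1 deposit / -1 barren.
    Returns one value per attribute; larger values suggest that the
    attribute separates the two class averages better.
    """
    mean_deposit = X[y == 1].mean(axis=0)
    mean_barren = X[y == -1].mean(axis=0)
    return (mean_deposit - mean_barren) ** 2
```

Attributes whose value is close to zero carry little information about the
class averages, which is the criterion used above to call an attribute not
relevant.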

Fig. 1. Mean squared differences of the average patterns.

Database  Attributes used                    Qual.  Quant.  N
I         1 to 25                            8      17      25
II        1 to 8                             8      -       8
III       9 to 25                            -      17      17
IV        11,12,13,14                        -      4       4
V         11,12,13,25                        -      4       4
VI        3,5,6,7                            4      -       4
VII       11,12,13,14,25                     -      5       5
VIII      11,12,13,20,25                     -      5       5
IX        3,5,6,7,11,12,13,25                4      4       8
X         11,12,13,14,18,19,20,21,23,24      -      10      10
XI        11,12,13,14,18,19,20,21,23,24,25   -      11      11

Table 1. Andes GIS learning databases used.

6.2 Data Preprocessing and deposit/barren Approach

The range of the attributes is extremely broad. To homogenize them, the
quantitative attributes are standardized; this preprocessing is needed for the
neural network to function correctly. For each continuous variable, the average
and standard deviation are calculated; the variable is then centered and its
values divided by the standard deviation. The qualitative attributes are not
modified. The standardized corpus was divided into learning and test sets,
consisting of randomly selected patterns from the whole corpus. Learning sets of
10% (64 patterns) to 95% (577 patterns) of the original database (641 patterns)
were generated. The complement was used as the test set. There are $N$ input
neurons in the network, depending on the database dimension. The unsupervised
part of the network, Fuzzy c-means, must find two classes: deposit and barren.
Minimerror then finds the best hyperspherical separator for each class. Under
the same conditions, a multilayer perceptron with 10 neurons in a single hidden
layer achieves up to 77% correct classification.
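
The standardization and the random learning/test split of Section 6.2 can be
sketched as follows. This is a minimal illustration assuming the corpus is a
NumPy array in which the quantitative attributes are identified by their column
indices; the function names, the `seed` parameter and the rounding of the
learning set size are assumptions.

```python
import numpy as np

def standardize(X, quantitative_cols):
    """Center each quantitative column and divide it by its standard deviation.

    Columns not listed in quantitative_cols (qualitative attributes) are
    left unchanged.
    """
    Xs = X.astype(float).copy()
    cols = list(quantitative_cols)
    mean = Xs[:, cols].mean(axis=0)
    std = Xs[:, cols].std(axis=0)
    std[std == 0] = 1.0                      # leave constant columns at zero
    Xs[:, cols] = (Xs[:, cols] - mean) / std
    return Xs

def random_split(X, fraction, seed=0):
    """Randomly select a fraction of the patterns as the learning set.

    The complement is returned as the test set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_learn = int(round(fraction * len(X)))
    return X[idx[:n_learn]], X[idx[n_learn:]]
```

As in the text, the whole corpus is standardized first and split afterwards, so
the averages and standard deviations are computed over all patterns.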



7 Results 



Classification performance is the percentage of correctly classified
situations. Learning and generalization discrimination of deposit and barren
were obtained for all learning databases. Database VII (which includes only a
few quantitative attributes) had the best learning and generalization
performance of all the databases. When all the attributes were used, performance
fell. Figure 2 shows some results illustrating this behavior. Based on this
information, we kept this database to perform 100 random tests. The capacity of
discrimination between deposit and barren, as a function of the percentage of
learned patterns, is shown in figure 3. Detection of the deposit class is
noticeably better than that of the barren class. We note that the detection of
gold, silver and copper remains quite precise, but that of molybdenum is rather
poor. This can be explained by the scarcity of this metal.








Fig. 2. Generalization performances according to the learning set size obtained by the 
hybrid model with various databases. 



8 Conclusion 



We developed a variation of Minimerror for unsupervised classification with
hyperspherical separations. The hybrid combination of Minimerror and Fuzzy
c-means proved to be the most promising. Applied to a real-world database, this
strategy allowed us to predict in a rather satisfactory way whether or not a
site could be identified as a deposit. The 75% of correctly classified patterns
obtained with this unsupervised/supervised algorithm is comparable to the values
obtained with other classical supervised methods. This also shows the
discriminating capacity of the descriptive attributes that we selected as the
most suitable for this two-class problem. Finally, according to figure 3, we
should be able to obtain a significant improvement in performance simply by
increasing the number of examples. Additional studies are needed to determine
other relevant attributes more accurately, as well as to perform hybrid learning
on multi-class tasks.




Fig. 3. Deposit/barren discrimination performance in generalization according to
the learning set size (100 tests), obtained by the hybrid model with database VII.